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Abstract 



We propose a novel method to embed a functional magnetic resonance imaging (fMRI) dataset in a low-dimensional 
space. The embedding optimally preserves the local functional coupling between fMRI time series and provides a 
low-dimensional coordinate system for detecting activated voxels. To compute the embedding, we build a graph of 
functionally connected voxels. We use the commute time, instead of the geodesic distance, to measure functional 
distances on the graph. Because the commute time can be computed directly from the eigenvectors of (a symmetric 
version) the graph probability transition matrix, we use these eigenvectors to embed the dataset in low dimensions. 
After clustering the datasets in low dimensions, coherent structures emerge that can be easily interpreted. We 
performed an extensive evaluation of our method comparing it to linear and nonlinear techniques using synthetic 
datasets and in vivo datasets. We analyzed datasets from the EBC competition obtained with subjects interacting in 
an urban virtual reality environment. Our exploratory approach is able to detect independently visual areas (V1/V2, 
V5/MT), auditory areas, and language areas. Our method can be used to analyze fMRI collected during "natural 
stimuli" . 
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1. Introduction 

At a microscopic level, a large number of internal 
variables associated with various physical and phys- 
iological phenomena contribute to dynamic changes 
in functional magnetic resonance imaging (fMRI) 
datasets. FMRI provides a large scale (as compared 
to the scale of neurons) measurement of neuronal 
activity, and we expect that many of these vari- 
ables will be coupled resulting in a low dimensional 
set for all possible configurations of the activated 
fMRI signal. We assume therefore that activated 
fMRI time series can be parametrized by a small 
number of variables. This assumption is consistent 



with the usage of low dimensional parametric mod- 
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els for detecting activated voxels (Petersson et al. 
1999). This assumption is also consistent with the 



empirical findings obtained with principal compo- 
nents analysis (PCA) and independent components 
analysis (ICA) flB.B.Biswal and Ulmer[ [1999} |McK- 



eown et ah] 2QQ3| ), where a small number of compo- 
nents are sufficient to describe the variations of most 
activated temporal patterns. Both PCA and ICA 
make very strong assumptions about the compo- 
nents: orthogonality and statistical independence, 
respectively. Such constraints are convenient math- 
ematically but have no physiological justification, 
and complicate unnecessarily the interpretation of 



the components (Friston, 1998). A second limitation 



of PCA and ICA is that both methods only provide 



a linear decomposition of the data (Friston, 2005). 



There is no physiological reason why the fMRI signal 
should be a linear combination of eigen-images or 
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eigen-time series. In practice, the first components 
identified by PC A are often related to physiological 
artifacts (e.g. breathing), or coherent spontaneous 



ity (Raichle and Mintun, 2006). We call these time 



fluctuations (Raichle and Mintun 2006). These ar- 



tifacts can be responsible for most of the variability 
in the dataset. Stimulus triggered changes, which 
are much more subtle, rarely appear among the first 
components. 

The contribution of this paper is a novel ex- 
ploratory method to construct an optimal coordi- 
nate system that reduces the dimensionality of the 
dataset while preserving the functional connectiv- 



ity between voxels (Sporns et al. , 2000). First, we 



define a distance between time series that quan- 



tifies the functional coupling (Fox et al. , 2005), 



or connectivity between the corresponding voxels. 
We then construct an embedding that preserves 
this functional connectivity across the entire brain. 
After embedding the dataset in a lower dimen- 
sional space, time series are clustered into coherent 
groups. This new parametrization results in a clear 
separation of the time series into: (1) response to a 
stimulus, (2) coherent physiological signals, (3) ar- 
tifacts, and (4) background activity. We performed 
an extensive evaluation of our method comparing it 
to linear and nonlinear techniques using synthetic 
datasets and in vivo datasets. 

2. Methods 

2.1. Overview of our approach 

Our goal is to find a new parametrization of an 
fMRI dataset, effectively replacing the time series 
by a small set of features, or coordinates, that facili- 
tate the identification of task-related hemodynamic 
responses to the stimulus. The new coordinates will 
also be able to reveal the presence of physical or 
physiological processes that have an intrinsic low 
dimension. Such processes can be described with 
a small number of parameters (dimensions) in an 
appropriate representation (set of basis functions). 
These time series should be contrasted with noise 
time series that have a very diffuse representation 
in any basis. Example of such low-dimensional pro- 
cesses include task-related hemodynamic responses, 
non-task-related physiological rhythms (breathing 
and heart beating motion). We expect that only 
a small fraction of all time series will have a low- 
dimensional representation. The remaining time se- 
ries will be engaged in a spontaneous intrinsic activ- 



series background time series. As shown in our ex- 
periments, background time series are more com- 
plex, and cannot be well approximated with a small 
number of parameters. 

We now introduce some notations that will be 
used throughout the paper. Let X denote an fMRI 
dataset composed of T scans, each comprised of N 
voxels, which is represented as a N x T matrix, 



x 1 (l) ••• xi(T) 



x N (l) ••• x N (T) 



(1) 



Row i of X is the time series = [a^(l), • • • , xi{T)\ 
generated from voxel i. Column j is the j th scan 
unrolled as a 1 x N vector. In this work, we regard 

as a point in R T , with T coordinates. 

We seek a new parametrization of the dataset that 
optimally preserves the local functional coupling be- 
tween time series. Most methods of reduction of di- 
mensionality used for fMRI are linear: each x$ is pro- 
jected onto a set of components <p k . The resulting 
coefficients < x^, <fi k >, k = 0, • • • ,K—1 serve as the 
new coordinates in the low dimensional representa- 
tion. However, in the presence of nonlinearity in the 
organization of the x^ in R T , a linear mapping may 
distort local functional correlations. This distortion 
will make the clustering of the dataset more difficult. 
Our experiments with in vivo data confirm that the 
subsets formed by the low-dimensional time series 
have a nonlinear geometry and cannot be mapped 
onto a linear subspace without significant distortion. 
We propose therefore to use a nonlinear map \I/ to 
represent the dataset X in low dimensions. Because 
the map \I/ is able to preserve the local functional 
coupling between voxels, low dimensional coherent 
structures can easily be detected with a clustering 
algorithm. Finally, the temporal and the spatial pat- 
terns associated with each cluster are examined and 
the cluster that corresponds to the task-related re- 
sponse is identified. In summary, our approach in- 
cludes the following three steps: 

(i) Low dimensional embedding of the dataset; 

(ii) Clustering of the dataset using the new pa- 
rameterization; 

(iii) Identification of the set of activated time se- 
ries. 
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Fig. 1. The network of functionally correlated voxels, repre- 
sented by a graph, encodes the functional connectivity be- 
tween time series. 

2.2. The connectivity graph: a network of 
functionally correlated voxels 

In order to construct the nonlinear map \I/ we need 
to estimate the functional correlation between vox- 
els. We characterize the functional correlation be- 
tween voxels with a network. Similar networks have 
been constructed to study functional connectivity 
Caclin and Fonlupt , |2006[ 



in (Achard et al., 



2006 



Egui luz et al.| |2005| |Fox et al.| |2005| |Sporns et aT| 
2000 ) . We represent the network by a graph G that 
is constructed as follows. The time series origi- 
nating from voxel i becomes the node (or vertex) 
of the graph 1 . Edges between vertices quantify the 
functional connectivity. Each node x^ is connected 
to its n n nearest neighbors according to the Eu- 
clidean distance between the time series, ||x— Xj || = 

(ELiM*) -xj(t)) 2 ) 1/2 (Fig.Q- The weight W hJ 
on the edge quantifies the functional proxim- 

ity between voxels i and j and is defined by 

" 1 1 x i - x j 1 1 2 1® 2 if x . i s connected to x 7 - 



otherwise. 



(2) 

The scaling factor a modulates the definition of 
proximity measured by the weig ht If a is very 
large, then for all the nearest neighbors Xj of x^, we 
have Wij w 1 (see (J2|), and the transition proba- 
bility is the same for all the neighbors, Pij = l/n n . 



1 We slightly abuse notation here: is a time series, as well 
as a node on the graph. 



This choice of a promotes a very fast diffusion of 
the random walk through the dataset, and blurs 
the distinction between activated and background 
time series. On the other hand, if cr is extremely 
small, then Pij = for all the neighbors such that 
||x^ — Xj|| > 0. Only if the distance between x^ and 
Xj is zero (or very small) the transition probability 
is non zero. This choice of a accentuates the differ- 
ence between the time series, but is more sensitive 
to noise. In practice, we found that the universal 
choice 

<j = n x min ||x^ — Xj || 

i<j 

where nG (0,5] (we used n = 2 for all experiments) 
is usually optimal. 

The weighted graph G is fully characterized by 
the N x N weight matrix W with entries Wij. 
Let D be the diagonal degree matrix with entries 
D iyi = J2j Wij. The spatial coordinates of the vox- 
els i and j are currently not used in the compu- 
tation of Wij. We know that spatial information 
can be useful: truly activated voxels tend to be spa- 
tially clustered. However, spatial proximity should 
be measured along the cortical ribbon, and not in 
the 3-D volume. 



2.3. A new way to measure functional distances 
between voxels 

Once the network of functionally connected vox- 
els is created, we need to define a distance between 
any two vertices x$ and Xj in the network. This dis- 
tance should reflect the topology of the graph, but 
should also be able to distinguish between strongly 
connected vertices (when the voxels i and j belong 
to the same functional region) and weakly connected 
vertices (when i and j have similar time series x^ 
and Xj, but belong to different functional regions). 
The Euclidean distance that is used to construct the 
graph is only useful locally: we can use it to compare 
voxels that are functionally similar, but we should 
use a different distance to compare voxels that may 
not be functionally similar. As shown in the exper- 
iments, the shortest distance 5(x^,Xj) between two 
nodes x^ and Xj measured along the edges of the 
graph is very sensitive to short-circuits created by 
the noise in the data. A standard alternative to the 
geodesic distance is the commute time, ft(x^,Xj), 
that quantifies the expected path length between x^ 
and Xj for a random walk started at x^ ( |BremaucH 
19991). 
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We review here the concept of commute time in 
the context of a random walk on a graph. We show 
how the commute time can be computed easily from 
the eigenvectors of D~ 1//2 WD" 1//2 . Let us consider 
a random walk Z n on the connectivity graph. The 
walk starts at x^, and evolves on the graph with the 
transition probability P = D _1 W, 



J 



(3) 



If the walk is at x^, it jumps to one of the near- 
est neighbors, Xj, with probability Pij. The walk 
first visits all voxels in the same functional area be- 
fore jumping to a different functional area. Indeed, 
if voxel i and j are in the same functional area, and 
voxel k is in a different functional area, then we ex- 
pect that ||x^ — Xj|| <C ||x^ — Xfc||, and therefore 
Pi j ^> Pi^. This observation can be quantified by 
computing the average hitting time that measures 
the number of steps that it takes for the random 



walk started at x^ to hit for the first time (Bre- 



maud||1999| ) 



H(xi,x.j) = E{[Tj] with Tj = min{n > 0; Z n = j}. 

The hitting time is not symmetric, and cannot be a 
distance on the graph. A proper distance is provided 
by a symmetric version of H, called the commute 
time, (premaudjn.999), 

K,(xi,Xj) =H(x i ,x j )+H(x j ,Xi) =E i [T j ]+E j [T i \. 

(4) 

As one would expect, ft(x^Xj) increases with the 
geodesic distance £(x^,Xj). Unlike the geodesic dis- 
tance, K,(xi,Xj) decreases when the number of paths 
between the nodes increases. 
Commute time and clustering coefficient 
The commute time is greatly influenced by the rich- 
ness of the connections between any two nodes of the 
network. This concept can be quantified by the clus- 
tering coefficient. Let x^ be a node with n n neigh- 
bors. The neighbors of x^ may, or may not, be neigh- 
bors of one another. To asses the transitivity of the 
connections, we can compute the total number of 
edges, that exist between all the neighbors of 



x^. The clustering coefficient (Albert and Barabasi 



2002 ) is Ci = 2ei/[n n (n n — 1)]. The maximum value 
of Ci is 1 and is achieved when each neighbor of x^ 
is connected to all the other neighbors of x^ (they 
form a clique). If the average clustering coefficient, 
computed over all nodes of the network, is close to 
1, then there will always be multiple routes between 
any two nodes x^ and Xj , and the commute time will 




Fig. 2. The eigenvector <fi k as a function defined on the nodes 
of the graph. 



remain small. Egufluz et al. (2005) measured clus- 



tering coefficients in networks of functionally con- 
nected voxels in fMRI that were indicative of scale- 



free small- world networks. Achard et al. (2006 ) iden- 



tified networks of richly connected hubs in the cor- 
tex, and have also shown that the functional network 
(as measured by fMRI) exhibits the "small world" 
property. The commute time provides a distance on 
the graph that takes into account the abundance of 
connections that may exist between two nodes of the 
graph. 

A spectral decomposition of commute time 
As explained in the Appendix, the commute time 
can be conveniently computed from the eigenvector 
01, • • • , 4>n of the symmetric matrix D -1 / 2 PD -1 / 2 . 
The corresponding eigenvalues are between —1 and 
1, and can be labeled such that — 1<Aat---<A2< 
Ai = 1. Each eigenvector 4> k is a vector with N 
coordinates, one for each node of the graph G. We 
therefore write 



fc = [</> fc (i),0 fe (2),-- - ,0 fe (Aor 



(5) 



to emphasize the fact that we consider <p k to be a 
function defined on the nodes of the graph (Fig. [2]). 
According to (16), the commute time can be ex- 
pressed as 



N 



^( Xi , Xj ) = ^- 



k=2 



(6) 



where 7r^ = dij ■ Wij is the stationary distribu- 
tion associated with P, 7r T P = ir T . The right hand 
side of (|6|) is the sum of N — 1 squared contributions 



of the form (c/) k (i) / ^ - 4> k (j)/<fir])/y/l - \ k . 
Each contribution is the difference between two 
terms: 4> k {i) / \fih and 4>k(j)/ 'v^7> which are asso- 
ciated with nodes x^ and Xj , respectively. 
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Algorithm 1: Construction of the embedding 
Input: 

• xi(t),t = 0,... ,T-l,i = l,... ,7V, 

• a; n n ; K: dimension of the embedding, 
Algorithm: 

(i) construct the graph defined by the n n nearest 
neighbors of each 

(ii) compute W and D. Find the first K eigenvec- 
tors, (p k: of D~2WD~2 

Output: K coordinates of each x^, 

^{0 fe (i)A/r^},fc = 2,... ,k + i 

Fig. 3. Construction of the embedding 

2.4. Embedding of the dat as et 

We define a mapping from x^ to a vector of size 
N - 1, 

T 



X ? 



02(0 



<M0 



• (7) 



The idea was first proposed in (Berard et al. 



1994) to embed manifolds. Recently, the same idea 



has been revisited in the machine learning litera- 



ture JBelkin^ncr^iyogi , 2003 Coifman and La- 
fon, 2006). According to (J6|, the commute time 



is simply the Euclidean distance measured be- 
tween the new coordinates. In practice, we need 
not use all the eigenvectors in (|7|). Indeed, because 
— 1 < Ajv • • • < A2 < Ai = 1 we have — X2 > 

1 / y/1 — A3 > • • • 1/ y/1 — Aat. We can therefore ne- 
glect <j> k (j)/y/l — X k in ^ for large fc, and reduce 
the dimensionality of the embedding by using only 
the first K coordinates. Finally, we define the map 
^ from R T to R K , 



02(0 



0k+i(O 



\A - Ax+i 
(8) 

The low dimensional parametrization ([8| provides a 
good approximation when the spectral gap Ai — A2 = 
1 — A2, is large. The construction of the embedding 
is summarized in Fig. [3j Unlike PC A which gives a 
set of vectors on which to project the dataset, the 
embedding ([8| directly yields the new coordinates 
for each time series x^. The new coordinates of x^ 
are given by concatenating the i th coordinates of 
4> kl k — 2, • • • , K + 1. The first eigenvector <p is 
constant and is not used. What is the connection to 
PCA ? 



Because each <fi k is also an eigenvector of the graph 
Laplacian, L = I-D2PD"2 ; it minimizes the "dis- 
tortion" induced by (f> and measured by the Rayleigh 
ratio ( |Belkin and Niyogi[ [2003] ) , 



mm 

0,11011=1 



(9) 



where 4> k is orthogonal to the previous eigenvector 
{0o, l5 • • • , 4>k-i}- The gradient of 4> at the vertex 
x^ on the graph can be computed as a linear com- 
bination of terms of the form (<fi(i) — <f>(j)), where 
j and i are connected. Therefore the numerator of 
the Rayleigh ratio ([9j) is a weighted sum of the gra- 
dients of 4> at all nodes of the graph, and quantifies 
the average local distortion created by the map 4>. A 
function that minimizes (|9| , will still introduce some 
global distortions. Only an isometry will preserve all 
distances between the time series, and the isometry 
which is optimal for dimension reduction is given 
by PCA. However, as shown in the experiments, the 
first PCA components are usually not able to cap- 
ture the nonlinear structure formed by the set of 
time series. As a result, PCA fails to reveal the orga- 
nization of the dataset in terms of low dimensional 
activated time series and background time series. 
Thirion and Faugeras (2003) have used kernel PCA 



to analyze the distributions of the coefficients of a 
model fitted to fMRI time series. A block design 
fMRI dataset from a macaque monkey is studied. 
By visual inspection, the authors show that their 
method can organize the coefficients according to 
the relative strength of the activation. The eigen- 
vectors of the Laplacian have also been used to con- 
struct maps of spectral coherence of fMRI data in 
dThirion et"aH|2006[ ). 

How many new coordinates do we need ? 
We can estimate K, the number of coordinates in 
the embedding (J8|, by calculating the number of 
4> k needed to reconstruct the low dimensional struc- 
tures present in the dataset. As opposed to PCA, the 
embedding defined by ([8| is not designed to mini- 
mize the reconstruction error, it only minimizes the 
average local distortion ([9|. Nevertheless, we can 
take advantage of the fact that the eigenvectors {4> k } 
constitute an orthonormal basis for the set of func- 
tions defined on the vertices of the graph ( |Chung| 
1997). In particular, we make the trivial observation 
that the scan at time t, x(t) = [x\(t), • • • ,£at(£)] t , 
is a function defined on the nodes of the graph: x(t) 
at node x^ is the value of the fMRI signal at voxel z, 
Xi(t). We can therefore expand x(t) using the 4> k , 
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K 

k=i 

N 

+ ^ <x(t),0 fc >0 fc 

= x*(t) + r(t), 

where x K (t) = J]fc=i < x(t),0 fc > fc is the ap- 
proximation to the t th scan using the first K eigen- 
vectors, and r(t) is the residual error. We can com- 
pute a similar approximation for all the scans (t = 
1, • • • , T), and compute the temporal average of the 
relative approximation error at a given voxel z, 

^- ^"-f"' . do) 

Finally, one can compute the average of £i(K) over 
all a group of voxels in the same functional area 
TZ, s n (K) = (Eien^( K ))/\K We expect s n (K) 
to decay fast with K if the time series within 1Z 
are well approximated by l5 • • • , 4> K . In practice, 
for each region 1Z we find after which en stops 
having a fast decay. We then choose K the number of 
coordinates to be the largest K-jz among all regions 
1Z. Examples are shown in Fig. [9]-right, where K is 
chosen at the "knee" of the curves £n(K). 

2.5. k -means clustering 

The map \I/ defined by ([sj) provides a set of new 
coordinates, \I/(x^), for each x^. We cluster the set 
{^(x^), i = 1, • • • , N} into a small number of coher- 
ent structures and a large background component. 
We use a variation of the /c-means clustering algo- 
rithm for this task. The low dimensional parameteri- 
zation of the dataset usually has a "star" shape (Fig. 
[4]- left), where the out-stretching "arms" of the star 
are related to activated time series or strong phys- 
iological artifacts, and the center blob corresponds 
to background activity. Our goal is to segment each 
of the "arms" from the center blob. We project all 
the ^(x^) on a unit sphere centered around the ori- 
gin (Fig. fright). We then cluster the projections 
on the sphere: the distance between two points on 
the sphere is measured by their angle. The center 
component (shown in black in Fig. [2]- left) is usu- 
ally spread all over the sphere, and is mixed with 
the branches (Fig. fright). The time series from the 
background component can be separated from the 
other time series by measuring the distance of ^(x^) 
to the origin (Fig. |4j-left). The number of clusters 




Fig. 4. Left: low dimensional embedding of the dataset an- 
alyzed in section [3.2| Each dot i is ^(xi), the image of the 
time series through the mapping Right: the ^(x^) are 
projected on the sphere, and the projections are clustered 
on the sphere. 

can be chosen to be equal to K + 1. This choice 
is based on experience that each eigenvector (each 
dimension) will contribute to an independent arm, 
and the background time series will contribute to 
the last cluster. This choice may over- segment the 
dataset. This is usually obvious from the visual in- 
spection scatter plot, and the corresponding spatial 
maps. We iteratively refine the estimate of the num- 
ber of clusters, by merging small clusters at each 
iteration. 

3. Results 

In this section we describe the results of experi- 
ments conducted on synthetic and in-vivo datasets. 
We construct the embedding according to the algo- 
rithm described in Fig [3j and the clustering algo- 
rithm, described in section [23} divides the embed- 
ded dataset into coherent groups. We interpret the 
coherent structures in terms of a task-related hemo- 
dynamic response, and physiological artifacts. Vox- 
els that correspond to task-related activation are 
identified and activation maps are generated accord- 
ingly. We evaluate our approach using two differ- 
ent criteria. First, we compare the parameterization 
created by our approach with the parametrization 
produced by PCA. The comparison is based on 
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Fig. 5. Left: synthetic brain: activation (white), non-activa- 
tion (gray), outside the brain (black). Right: stimulus time 




projection onto 1 st basis 



Fig. 6. Activated (red) and background (black) time se- 
ries projected on the first two PCA components (left), and 
parametrized by \P (right). 



our ability to identify and extract well defined struc- 
tures from the new parameterization. Our second 
criterion is the comparison of the activation maps 
obtained with our approach with the ones gener- 
ated by the General Linear Model (GLM). Five 
datasets are selected for the analysis: a synthetic 
dataset, a block design dataset, an event-related 
dataset, and two datasets from the EBC competi- 



tion Uniyersity_of Pittsburgh (2007). 



3.1. Synthetic data 

The synthetic datasets were designed by blend- 
ing activation into background, non activated, time- 
series that were extracted from a real in vivo dataset 



(described in section 3.2). We discarded time series 
exhibiting large variance. Activated time-series were 
constructed by adding an activation pattern f(t) to 
the background time series. The activation pattern 
f{t) was obtained by convolving the canonical hemo- 
dynamic filter h(t) used in SPM ( |Friston| [2005 ) 



a 



-{t-d 1 )/b 1 _ r I ±_ \ p -{t-d 2 )/b 2 



h(t) 

(ii) 

with the stimulus time-series g(t) (Fig. [5| left). The 
two parameters a and b\ were randomly distributed 
according to two uniform distributions, on [0.8, 1.2] 
and on [5, 10] respectively. The other parameters 
were fixed and chosen as follows, a\ = 6, = 
12,^2 = 0.9, and c = 0.35. By varying b\ and a 
independently, we generated a family of hemody- 
namic responses with different peak dispersions and 
scales. We generated 20 independent realizations 
according to this design. Each dataset consisted of 
a white disk of activated voxels inside a circular 
gray brain of background voxels (Fig. [5]- left). Black 
voxels were in the air. There were altogether 1067 
voxels inside the circular brain, with 97 activated 






Fig. 7. Left: clustering of {^(x^),i = 1, • • • , N} into 2 clus- 
ters: activated (red) and background (black). Right: acti- 
vated (white) and background (gray) pixels. 

voxels, (9% activation). Fig. [6]- left shows the pro- 
jections on the first two principal axes of the 1067 
time series of one realization. The projections are 
color-coded according to their status: activated (red 
diamond) and background (black circle). The pa- 
rameterization given by our approach is shown in 
fright. We used only K = 2 coordinates in ([8| 
for ^. Activated time series are distributed along 
a thin strip that extends outward from the main 
cluster. This low dimensional structure is compact 
and easy to identify. In comparison, the two dimen- 
sional representation given by PCA (Fig. |6]-left) is 
less conspicuous: activated time series (red) overlap 
with background time series (black). After embed- 
ding the dataset into two dimensions, the dataset is 
partitioned into two clusters. Fig. [7} left shows the re- 
sult of clustering: the labels (red for activated, black 
for background) are based on the clustering only. 
The corresponding activation map is shown in Fig. 
[TJ-riglit. We compared our algorithm with a linear 
model equipped with the perfect knowledge of the 
hemodynamic response h(t) (with b\ = 1 and a = 
1). A Student t-test was applied to the regression 
coefficient to test its significance, and voxels with 
a p- value smaller than a threshold were considered 
activated. Fig. [5] provides a quantitative comparison 
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Fig. 8. ROC curves: comparison of our approach to the GLM. 
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Fig. 9. Three-dimensional embedding: 

{^(xi),i = l,--- ,3084}: cluster I (red), cluster II (blue), 
cluster III (green), and background (black). Time series 
marked by a circle are shown in Fig. |11| Right: residual 
error e-n{K) as a function of the number of coordinates K. 



of our approach with the linear model using a re- 
ceiver operator characteristic (ROC) curve. The 
true activation rate (one minus the type II error) is 
plotted against the type I error (false alarm rate). 
The ROC curve was computed over 20 trials. Each 
trial included different activation strengths a. In 
this experiment, the linear model has access to an 
oracle, in the form of the perfect knowledge of the 
hemodynamic response h(t), and should therefore 
perform very well. In fact, if the noise added to f(t) 
were to be white, we know from the matched filter 
theorem that the linear model would provide the 
optimal detector. Here, the noise is extracted from 



the data, and is probably not white ( Bullmore et al. 
2001). As shown in Fig. [8| our approach performs 



as well as the GLM for a type I error in the range 
[0.003,0.009]. At low type I error, our approach 
misses activations. 

3.2. In vivo data I: block design dataset 

We apply our technique to a block design dataset 
that demonstrates activation of the visual cortex 
(iTanabe et al. 2002). A flashing checkerboard im- 



age was presented to a subject for 30 seconds, and a 
blank image was presented for the next 30 seconds. 
This alternating cycle was repeated four times. Im- 
ages were acquired with a 1.5T Siemens MAGNE- 
TOM Vision equipped with a standard quadrature 
head coil and an echoplanar subsystem (TR = 3s, xy 
dimension: 128 x 128, voxel size = 1.88 x 1.88 x 3mm, 
12 contiguous slices). Eighty images were obtained. 
We analyze a volume that contains the calcicarin 



cortex (Brodman areas 17) (Fig. 13). There are al- 
together 3084 intracranial time series in the volume. 
The linear trend of each time series is removed. Fig. 
[9}left displays the image of the time series by the 




Fig. 10. Low dimensional embedding obtained by PC A (left), 
and ISOMAP (right). 
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Fig. 11. Time series from the cluster I (A), cluster II (B), 
and cluster III (C). 

embedding, ^(x^), i = 1, • • • , 3084. The time series 
are color-coded according to the result of the cluster- 
ing. The dataset is partitioned into four clusters. For 
comparison purposes, we show the embedding gen- 
erated by ISOMAP ( |Tenenbaum et al.| |2000[ ) (Fig. 
10 -left), and the embedding obtained by projecting 
the time series on the first three PCA axes (Fig. 
10 -right). ISOMAP makes it possible to reconstruct 
a low dimensional nonlinear structure embedded in 
high dimension. The algorithm computes the 
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Fig. 12. Eigenvectors 2 and 3 visualized as images. 

pairwise geodesic distance between any two points 
in the dataset, and uses multidimensional scaling to 
embed the dataset in low dimension. As explained in 
section [23} the geodesic distance is not appropriate 
for fMRI data that are very noisy. For this reason an 
embedding based on commute time is more robust. 
The representation given by PC A and ISOMAP, 



shown in Fig. 10 left and right, are less conspicuous 
than the representation obtained by our approach. 
No low dimensional structures are apparent in these 
two plots. The plot of the residual error (Fig. |9| 
right) indicates that eigenvectors 2 an d 03 create 
the largest drop in the residual energy, and should be 
the most useful. We use K = 3 coordinates: 2 , 3 
and 4 to embed the dataset. In order to interpret 
the role of the clusters, we select several time series 
from each cluster (identified by circle in the scat- 
ter plot in Fig. [9J and plot them in Fig. [TTJ The 
time series from the red cluster (Fig. [TTJ A) are typ- 
ical hemodynamic responses to a periodic stimulus. 
The time series at the tip of the red cluster, marked 
by a red circle, is the red curve in Fig. [TTJ-A. It is 
the strongest activation pattern. Times series in the 
middle of the red cluster (blue and black circles) ex- 
hibit weaker activation (blue and black curves). We 
interpret cluster I as voxels activated by the stim- 
ulus. The embedding has organized the time series 
according to the strength of the activation: strong 
activation at the tip and weak activation at the base 
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Fig. 13. Top two rows: voxels are color-coded according to 
the cluster color (except for the background cluster). We in- 
terpret cluster I (red) as voxels in the visual cortex recruited 
by the stimulus. Bottom two rows: activation maps obtained 
using the linear regression model (p = 0.001). 

of the branch (close to the background activity). 
The time series from the blue cluster have a high 
frequency (Fig.[n]-B), and are grouped together in- 
side the brain (Fig. 13). These time series could be 
related to non-task related physiological responses, 
such as a pulsating vein. Finally, the time series 
from the green cluster are less structured (Fig. [TT]), 
and are scattered across the region of analysis (Fig. 
13). We were not able to interpret the physiological 
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role of these time series. 



So what do the eigenvectors look like ? 
As explained in section [2731 we consider the eigen- 
vectors <fi k to be functions of the nodes i of the 
graph. We can therefore represent <fi k as an image: 
each voxel is color-coded according to the value of 
4> k (i). The majority of the values of 4> 2 are positive 
(deep red, in Fig. [l2]-left) . A few voxels take neg- 
ative values (yellow and cyan in Fig. [l2}left). The 
nodal lines (where <fi 2 changes sign) are localized 
around the area of activation in the visual cortex. 
We can check in Fig.[9]that the activated time series 
(red cluster) have a negative 4> 2 coordinate. In fact, 
4> 2 is known as the Fiedler vector ( |Chung[ [1997) 
and is used to optimally split a dataset into two 
parts. (Fig. [l2|). We computed the activation map 
obtained using a GLM. The regressor is computed 
by convolving the haemodynamic response defined 



by (11) with the experimental paradigm. The acti- 
vation map (thresholded at p = 0.001) is shown in 
Fig. [l3j and is consistent with the activation maps 
obtained by our approach. 



3.3. In vivo data II: event-related dataset 



0.15- 
1- 
0.05- 









» 


























UIUS 




Fig. 14. Three-dimensional embedding. Left: one-trial (blue) 
and two-trial (red) conditions are well separated. Right: clus- 
ter I (blue), cluster II (red) and background (green). Time 
series marked by a circle are shown in Fig. |16| 



We apply our method to an event-related dataset. 



Buckner et al. ( 2000 ) used fMRI to study age-related 



changes in functional anatomy. The subjects were 
instructed to press a key with their right index finger 
upon the visual stimulus onset. The stimulus lasted 
for 1.5s. Functional images were collected using a 
Siemens 1.5-T Vision System with an asymmetric 
spin-echo sequence sensitive to BOLD contrast (vol- 
ume TR = 2.68s, xy dimension: 64 x 64, voxel size: 
3.75 x 3.75mm, 16 contiguous axial slices). Each run 
consists of 128 TRs. For every 8 images, the subjects 
were presented with one of the two conditions: (i) 
the one-trial condition where a single stimulus was 
presented to the subject, and (ii) the two-trial condi- 
tion where two consecutive stimuli were presented. 
The inter-stimulus interval of 5.36s. was sufficiently 
large to guarantee that the overall response would 
be about twice as large as the response to the one- 
trial condition. We analyzed one run. After discard- 
ing the first and last four scans, the run included 
15 trials (8 one-trial and 7 two trial conditions) of 
8 temporal samples. Time series from the one-trial 
and two-trial conditions were averaged separately. 
Therefore, each voxel gave rise to two average time 
series of 8 samples. The linear trend was removed 





Fig. 15. Low dimensional embedding obtained by PC A (left), 
and ISOMAP (right). One-trial condition (blue) and two- 
trial condition (red) are mingled. 
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from all average time series. The results published in 



(Buckner et al. 2000) show activation in the visual 



cortex, motor cortex, and cerebellum. We focus our 
analysis on four contiguous axial slices (7, 8, 9 and 
10) that extend from the superior caudate nucleus 
to the midlevel diencephalon. We extract a region 
(1025 intracranial voxels) that extends from the oc- 
cipital posterior horn of the lateral ventricles until 
the end of the occipital lobe. The total number of 
time series included in our analysis is 2050: two time 
series (one-trial and two-trial conditions) of T = 8 
samples for each voxel. 
Embedding the dataset in 3 dimensions 



We display in Fig. 14 -left the 2050 time series af- 
ter embedding the dataset in three dimension. The 
one-trial condition time series (blue) form a blob at 
the center. The two-trial (red) conditions time series 
forma "V" (Fig. fright). Both branches of the "V" 
are nearly one dimensional and are aligned with <p 2 
and 4 . The branch aligned with <p 2 also includes 
one-trial condition time series at the base of the 
branch. The branch aligned with 4 contains only 
two-trial condition time series. The dataset was par- 
titioned into three clusters (Fig.[l4]-right). We com- 
pare our embedding to the embedding generated by 
ISOMAP (Fig.[l5}left) and PC A (Fig. fright). No 



low-dimensional structures emerge from the repre- 
sentations given by PCA or ISOMAP, and two-trial 
and one-trial time series are mixed together. The 
eigenvectors 4> 2 and 4 create the largest drop in the 
residual energy, and are the most useful coordinates. 
We use K = 3 coordinates: 2 , 3 and <fi 4 to embed 
the dataset. To determine the role of the clusters, 
we select three groups of four time series (identified 
by circles in Fig. 14 right) and we plot them in Fig. 



[16) Time series from cluster I all have an abrupt dip 
at t = 7. The corresponding voxels are located along 



the border of the brain (Fig. 17). The original time 
series (before averaging) suffer from a sudden drop 
at time 95, which could be caused by a motion arti- 
fact, that affects the average time series. There are 
two groups of time series selected from cluster II: two 
two-trial condition time series located at the tip of 
the branch, and two one-trial condition time series 
located at the border with the background cluster. 
Time series from cluster II have a shape similar to an 



hemodynamic response (Fig. 16 -B and C), and the 
corresponding voxels are located in the visual cortex 



(Fig. 17). Therefore, we hypothesize that cluster II 
contains times series recruited by the stimulus. In- 
terestingly, the embedding has organized the time 
series along the branch 




Fig. 16. Four representative time series from cluster I (A) , 
cluster II with one-trial condition (B), and from cluster II 
with two-trial condition (C). 

of cluster II according to the strength of the activa- 
tion: from two-trial condition (strong response) at 
the tip, to one-trial condition (weak response) at the 
stem (close to the background time series). This is 
a remarkable result since no information about the 
stimulus, or the type of trial was provided to the 



algorithm. Fig. 17 shows the location of the voxels 
corresponding to the time series of cluster I (blue) 
and II (red) . For comparison purposes we computed 
the activation map obtained using the GLM. The 
averaged time series from the two-trial condition are 
used for the regression analysis. We use the hemody- 



namic re sponse function defined by (|Dale and Buck- 



ner 



1997), h{t) = ((t - 5)/r) 2 e^- h ^ r , where 5 



2.5, r = 1.5. The regressor was given by g(t) = 
h(t) * s(t), where s(t) is the stimulus time series. We 
thresholded the p- value at p = 0.005, and the activa- 



tion maps are shown in Fig 17 The activation maps 
constructed by our approach are consistent with the 
maps obtained with a GLM. 

3.4. Complex natural stimuli 

We demonstrate here that our method can be 
used to analyze fMRI datasets collected in natu- 
ral environments where the subjects are bombarded 
with a multitude of uncontrolled stimuli that can- 



not be quantified (Golland et al. , 2007 U.Hason 



20071 Meyer and Stephens, 2008). During such ex- 



etaTl [20041 |Haynes and Rees| |2006| |Malinen eTal 



periments, the subjects are submitted to an abun- 
dance of concurrent sensory stimuli, which makes 
the analysis with inferential methods impossible. 
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Fig. 17. Top two rows: activation maps: cluster I (blue) , 
cluster II (red). Bottom two rows: activation maps obtained 
using the linear regression model (p = 0.005). 

Exploratory techniques can help unravel the dif- 
ferent neural processes involved during the experi- 



ments (Malinen et al. , 2007). Unlike ICA, our ap- 



proach does not posit the existence of a mixture 
model. Our approach merely explores the connec- 
tivity in the dataset, and proposes a new parame- 
terization that preserves connectivity. 

The Experience Based Cognition competition 



(EBC) (University of Pittsburgh 2007) offers an 



opportunity to study complex responses to natural 
environments. The EBC datasets comprise three 
20-minute runs (704 TRs in each run) of subjects 



interacting in an urban virtual reality environ- 
ment. Subjects were audibly instructed to complete 
three search tasks in the environment: looking for 
weapons (but not tools) taking pictures of peo- 
ple with piercing (but not others), or picking up 
fruits (but not vegetables). The data was collected 
with a 3T EPI scanner (TR = 1.75s, xy dimension: 
64 x 64, voxel size = 3.28 x 3.28mm, 34 slices with 
a thickness of 3.5mm). We analyze the second runs 
of subjects 14 and 13. For each subject, the matrix 
X is composed of N = 4843 intra-cranial voxels at 
T = 704 TRs. We first remove the non regionally 
specific variance captured by the first eigenmodes 
of a singular value decomposition of the dataset. 
We then compute k = 2, • • • 10 using n n = 100 
and a = 2 x min^j ||x^ — Xj||. After embedding the 
dataset into four dimensions, we cluster the voxels. 



Figs. 18 and 20 display the datasets after embed- 
ding. Because we cannot display four dimensions, 
we show the projections of the dataset on three 
consecutive coordinates. All the coordinates con- 
tribute to the spread the dataset along elongated 
arms, which facilitates the clustering. Voxels that 
do not correspond to the background activity (the 



maroon cluster in Figs. 18 and 20) are superim- 
posed on anatomically registered structural images 
and colored according to their cluster label (see 
Figs. 19 and 21). For both subjects, the clusters are 



connected regions (see Figs. 19 and 21), compactly 
organized around functional areas related to the 
processing of visual, and auditory stimuli (music, 
cellphone ringing, dog roaring) in the virtual en- 
vironment. It is important to emphasize that our 
method never enforces any form of spatial proxim- 
ity, and is purely based on functional connectivity. 



For subject 14 (Fig. 19), the orange cluster corre- 
sponds to activation in the calcarine cortex associ- 
ated with VI / V2 representations of the lower visual 
fields, while the light blue cluster corresponds to rep- 
resentations of the upper visual fields. Activation in 
lateral areas (visual motion areas, MT/V5) is also 
present, as well as activity in the posterior convexial 
cortex (area VP). The activation is predominantly 
in the right hemisphere. Interestingly, the two clus- 
ters located in the visual cortex (light blue and or- 
ange) have very similar (f) 3 and 4 coordinates (see 
Fig.[l8}left). The cyan cluster corresponds to acti- 
vation in the right frontal gyrus (Broca area) associ- 
ated with language comprehension. The yellow clus- 
ters are located in the right and left superior tempo- 
ral gyri and medial temporal gyri (Wernicke area). 
These regions correlate with activation 
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Fig. 18. Subject 14: views of the embedded dataset using 
03> 04> ^5 ( left )> 4>Ai<t>bi ^6 (center), and </> 5 , </> 6 , 3 (right). 
The labels are obtained after clustering the datasets into 
six clusters. The background activity (maroon cluster) is 
centered around 0. 

in the auditory cortex and language areas. Finally, 
the dark blue cluster corresponds to activation in the 
prefrontal cortex. A very similar pattern of activity 



(Fig. 21) was obtained for subject 13. The blue and 
orange clusters, located in the calcarine cortex, cor- 
respond to VI and V2 areas. Again, these two clus- 
ters, both located in the visual cortex, have similar 
02 and 4>q coordinates. The green cluster is located 
in the medial temporal gyrus (Wernicke area) and 
is associated with language processing. 

We replaced the embedding constructed by our 
method with the parametrization produced by PC A, 
using the same pre-processing steps. PCA was 







Fig. 19. Subject 14. Top: V1/V2 (orange); representations of 
the upper visual fields (light blue); Broca area (cyan). Bot- 
tom: Wernicke area (yellow); prefrontal cortex (dark blue). 

unable to produce any meaningful activation maps 
(results not shown). In fact, the clustering would 
usually not converge. Interestingly, the activation 
maps obtained with these natural stimuli are very 
similar to the extrinsic network, found by Golland 
et al. (2007), that is composed of areas dedicated 



to the processing of sensory information: auditory, 
visual, somatosensory and language. 

4. Discussion 

We proposed a new method to compute a low di- 
mensional embedding of an fMRI dataset. The em- 
bedding preserves the local functional connectivity 
between voxels, and can be used to cluster the time 
series into coherent groups. Our approach, based on 
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Fig. 20. Subject 13: views of the embedded dataset using 
02> 03> 06 ( left )> 03> 06> 08 (center), and </> 6 , </> 8 , 2 (right). 
The labels are obtained after clustering the datasets into 
five clusters. The background activity (maroon cluster) is 
centered around 0. 

a spectral decomposition of commute time, appears 
to be more robust than a method based on the com- 
putation of the geodesic distance between time series 



(Tenenbaum et al. 2000b. Our approach is able to 



detect independently visual areas (VI /V2, V5 /MT) , 
auditory and language areas that are recruited when 
subjects interacted in an urban virtual reality en- 
vironment. We believe that this approach offers a 
new approach for the analysis of natural stimuli. 
The method still requires the visual inspection of 
the spatial patterns of activations. This is a stan- 
dard limitation of exploratory methods. We are cur- 
rently exploring methods to combine our approach 
with standard functional atlases. 
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Fig. 21. Subject 13. Top: Wernicke area (green). Bottom: 
V1/V2 (orange); representations of the upper visual fields 
(light blue); lateral areas (light blue): visual motion areas 
(MT/V5). 



Choice of the parameters 

There are only two parameters that determine the 
embedding: n n the number of nearest neighbors and 
the scaling factor a. We described in section [2^2] a 
universal choice for a. The number of nearest neigh- 
bors can also be chosen according to a universal 
strategy, n n can be assigned to the largest power of 
ten smaller than the number of time samples (TRs) 
in the time series. For instance, for the datasets for 
the event-related and block design paradigm respec- 
tively we used n n = 7 and n n — 9 . The analysis 
of the EBC datasets that contain 704 TRs each re- 
quired a much larger number of nearest neighbors: 
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n n = 100. 

Computational complexity 

The complexity of our method is determined by 
the combined complexity of the nearest neighbor 
search and the eigenvalue problem. We use a brute 
force approach to search for nearest neighbors. We 
use the restarted Arnoldi method for sparse ma- 
trices implemented by the Matlab function eigs 
to solve the eigenvalue problem. The algorithm re- 
quires NO(K) + 0(K 2 ) storage, and 0(NK 2 ) + 
0(K 3 ) computations. Our MATLAB code will be 
made available shortly from our webpage. 



graph Laplacian. We can consider the eigenvectors 
<t>i , • • • , <p T of the symmetric matrix 



D2PD-2, 



(15) 



and write P in ( 14) in terms of the eigenvectors. We 
obtain 



«(<.;) = ErV(*#-^) • (16) 

Finally, we note that D^PD 1 / 2 = D^WD" 1 / 2 . 
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5. Appendix 

The hitting time satisfies the following one-step 
equation, 



E i [T j ] = l + Yl p i,kEi[T k }. 

k;k^j 



(12) 



Iterating this equation yields an expression of Ei [Tj] 
in terms of powers of P. Let us define the fundamen- 
tal matrix (BremaudJ [1999), 



Z = / + ^P* 



n. 



k>l 



then we have, 



Ei[Tj] = (Zjj - Z i:j )/7Vj. 



(13) 



(14) 



The proof is a straightforward consequence of (12), 
and can be found in (Bremaud, 1999[ ). Note that 
Z = (I— (P — II)) -1 is also the Green function of the 
Laplacian, which explains the connection with the 
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